Mixture-model cluster analysis using information theoretical criteria
Abstract
Estimating mixture models has long been proposed as an approach to cluster analysis, and several variants of the Expectation-Maximization (EM) algorithm are currently available for this purpose. Estimating a mixture model lets us determine the number of clusters and, at the same time, yields the distributional parameters of the clustering base variables. Several information criteria can support the selection of a particular model, or clustering structure. However, it remains an open question which criteria are better suited to particular applications. In the present work we analyze the relationship between the performance of information criteria and the measurement type of the clustering variables. To study this relationship, we analyze forty-two data sets with known clustering structure whose clustering variables are categorical, continuous or of mixed type. We then compare eleven information-based criteria on their ability to recover the clustering structures of these data sets. As a result, we select the AIC3, BIC and ICL-BIC criteria as the best candidates for selecting models with categorical, continuous and mixed-type clustering variables, respectively.
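The workflow described above can be illustrated for the continuous case. First, fit mixtures with K = 1, 2, 3, ... components by EM. Then score each fit with an information criterion and keep the K with the smallest score. The sketch below assumes univariate Gaussian components, a single EM start with a fixed number of iterations and no convergence test. It uses the commonly cited "smaller is better" forms of the criteria: AIC = -2 log L + 2p, AIC3 = -2 log L + 3p, BIC = -2 log L + p ln n, and ICL-BIC = BIC + 2 EN(tau), where EN is the entropy of the posterior memberships. The abstract does not spell out these definitions, so treat them as assumptions. The function names `fit_gmm_1d` and `criteria` are hypothetical. This is not the authors' implementation, and it does not cover categorical or mixed-type variables.

```python
import math
import random


def log_normal_pdf(x, mu, var):
    """Log-density of a univariate normal N(mu, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def e_step(xs, ws, mus, vs):
    """Return posterior memberships tau[i][j] and the mixture log-likelihood."""
    tau, loglik = [], 0.0
    for x in xs:
        logs = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(ws, mus, vs)]
        top = max(logs)
        log_px = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        tau.append([math.exp(l - log_px) for l in logs])
        loglik += log_px
    return tau, loglik


def fit_gmm_1d(xs, k, iters=200, seed=0):
    """Hypothetical helper: plain EM for a k-component univariate Gaussian mixture."""
    rng = random.Random(seed)
    n = len(xs)
    mean = sum(xs) / n
    pooled_var = sum((x - mean) ** 2 for x in xs) / n
    mus = rng.sample(xs, k)      # start the means at k randomly chosen data points
    vs = [pooled_var] * k        # start every variance at the pooled variance
    ws = [1.0 / k] * k           # equal mixing weights
    for _ in range(iters):
        tau, _ = e_step(xs, ws, mus, vs)
        for j in range(k):       # M-step: responsibility-weighted ML updates
            nj = max(sum(t[j] for t in tau), 1e-12)
            ws[j] = nj / n
            mus[j] = sum(t[j] * x for t, x in zip(tau, xs)) / nj
            var_j = sum(t[j] * (x - mus[j]) ** 2 for t, x in zip(tau, xs)) / nj
            vs[j] = max(var_j, 1e-6)  # variance floor guards against degenerate fits
    tau, loglik = e_step(xs, ws, mus, vs)
    n_params = 3 * k - 1         # k means + k variances + (k - 1) free weights
    return loglik, n_params, tau


def criteria(loglik, n_params, n, tau):
    """Hypothetical helper: information criteria of the form -2 log L + penalty."""
    entropy = -sum(t * math.log(t) for row in tau for t in row if t > 0)
    bic = -2 * loglik + n_params * math.log(n)
    return {
        "AIC": -2 * loglik + 2 * n_params,
        "AIC3": -2 * loglik + 3 * n_params,
        "BIC": bic,
        "ICL-BIC": bic + 2 * entropy,  # BIC penalised by classification entropy
    }


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data with a known structure: two well-separated clusters.
    xs = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
    scores = {}
    for k in (1, 2, 3):
        loglik, p, tau = fit_gmm_1d(xs, k)
        scores[k] = criteria(loglik, p, len(xs), tau)
    for name in ("AIC", "AIC3", "BIC", "ICL-BIC"):
        best = min(scores, key=lambda k: scores[k][name])
        print(name, "selects K =", best)
```

The criteria differ only in the penalty they add to -2 log L, so they can disagree on the selected K. This disagreement is the behaviour the study compares across categorical, continuous and mixed-type variables. In practice one would use several EM starts per K. For multivariate data, a library routine such as scikit-learn's `GaussianMixture` would replace the hand-written EM.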
Journal: Intell. Data Anal.
Volume: 11
Publication date: 2007